With the growing success of the social Web, most Web developers
have to interact with at least one social Web platform,
which implies studying the related API specifications.
These are often only informally described, may contain er- 
rors, lack harmonization, and generally speaking make the 
developer's work difficult. Most attempts to solve this prob- 
lem, proposing formal description languages for Web service 
APIs, have had limited success outside of B2B applications; 
we believe it is due to their top-down nature. In addition, a 
programmer dealing with one or several of these APIs has 
to deal with a number of related tasks such as data integration,
request chaining, or policy management, that are
cumbersome to implement. Inspired by the SPORE project, 
we present API Blender, an open-source solution to describe,
interact with, and integrate the most common social
Web APIs. To this end, we first introduce two
new lightweight description formats for requests and services 
and demonstrate their relevance with respect to current plat- 
form APIs. We present our Python implementation of API 
Blender and its features regarding authentication, policy 
management and multi-platform data integration. 

Categories and Subject Descriptors 

H.3.5 [Information Storage and Retrieval]: Online Information
Services — Web-based services; D.3.2 [Programming
Languages]: Language Classifications — Python

General Terms 

Design, Standardization 

Keywords 
1. INTRODUCTION

Interacting with platforms like Facebook, Youtube, Twitter,
Flickr, or Google+ has become an important part of many
software projects, whether it is for authentication purposes,
to collect information about a user, to present mash-ups
of popular social Web data, or for a myriad of other reasons.
Our perspective comes from the need to archive
important social data for preservation purposes. Regular
Web archives, such as those built by the Internet Archive,
often include content from or pointers to social Web platforms
but do not benefit from API data. As a consequence,
the archives are either partial - Facebook disallows generic 
crawling of its public pages - or lack some extra information 
that the API can provide, for instance extracted entities on 
Twitter. Designing an archival crawler for the social Web re- 
quires interfacing with the multiple social Web APIs, as well 
as respecting the policies imposed by these services, such as 
limiting the number of requests per hour. 

Many projects thus involve numerous interactions with 
various social platforms, sometimes with complex logic such
as retrieving the social graph, up to the third rank, of users who
have mentioned a specific keyword. Understanding the related
API specifications can be challenging. There is no de facto 
standard to describe them and they can contain mistakes or 
approximations. There is no clear specification, for instance, 
of how many requests per hour are allowed on the Twitter 
search API. For the most popular platforms, language-specific
libraries sometimes exist, but they often require the
same learning phase.

Having a unified description of the different social Web 
APIs is a technical challenge. An early step was taken with 
WSDL [2], a Web Services Description Language standard- 
ized by the W3C. WSDL has been heavily used in the in- 
dustry and is at the core of many service-oriented software 
projects [5]. However, most popular social platforms includ- 
ing Facebook, Twitter, or Google+ and many other Web 
services are not currently offering any WSDL description of 
their API and do not seem to have any plans to do so. The 
reasons are manifold: WSDL-based services are often considered
heavy machinery for such lightweight interfaces,
WSDL has historically focused on SOAP message exchanges 
rather than on RESTful APIs though it can now express 
both [10]. WSDL has no support for important API metadata
such as policy management or the description of a sequence
of service calls. In reaction to WSDL, some other
approaches to Web services description have been proposed,
a prime example being WADL [6], but they have not met
with more success on popular social Web platforms.

Another perspective is necessary. Spring Social is a Java
framework to interact with the different social platforms. 
We believe this bottom-up approach is a very promising 
way to make the developers' work simpler. Spring Social 
implements a number of useful functionalities (authentica- 
tion, uniform interface to some of the API types, etc.) but 
does not fulfill our requirements. On the one hand, some 
important features, especially for archival crawling, are not 
considered, such as limits on the number of requests. On 
the other hand, using Spring Social requires understanding
a large amount of code before being able to interact
with a social platform. To give an order of magnitude of the
size of the software, the core v1.02 contains 405 files, without
implementing any Web API. With API Blender, we
aim at more simplicity and flexibility, as highlighted by the
example of use we give in Section 3.

Our main source of inspiration has been the SPORE project [4].
It consists of a simple, implementation-agnostic JSON format
for describing Web APIs designed according to
the REST principles. The project was started recently
and is still under development.

With API Blender, we extend SPORE with the follow- 
ing contributions: 

1. two simple description formats at the API and request
levels, adapted to social platforms, restructuring SPORE
and complementing it;

2. an open Python implementation that makes it easy to
integrate various platforms;

3. the following features, some of them left out of existing 
tools or libraries: authentication, server policy man- 
agement, multi-platform data integration, and request 
chaining. 

We designed API Blender based on what we observed
on five prominent social platforms: Twitter,
Facebook, Google+, FlickR and Youtube. However, we
strove to keep it flexible enough to be extended
to many other Web APIs.

Our article is organized as follows. In Section 2, we present
the description formats and discuss their relevance to social
platforms. We then detail in Section 3 our implementation
in Python and its features.

2. DESCRIPTION FORMATS 

A Web API consists of a set of HTTP request messages
associated with responses, sent to a specific HTTP server having
its own rules. Note that Twitter has different APIs corresponding
to different hosts: for instance, api.twitter.com:80
or search.twitter.com:80. We describe a Web API with
several objects that describe the server and its rules
(with respect to access policies) as well as the interactions
it offers. We find JSON light and readable and have
chosen to use it as a serialization format. In what follows, we tried
to use straightforward names and self-explanatory conventions
to define the different elements.



Server description format. We have extended SPORE with
a consistent object-oriented approach, as well as the addition
of authentication and policy usually required to interact
with social platform Web APIs.

Server Object

"name": string,
"host": string,
"port": integer,
"authentication": auth_object,
"policy": policy_object,
"interactions": [interaction_object]

Port, policy, and authentication are optional. The port
defaults to 80.

Two authentication protocols are supported at the moment,
one based on a unique authentication URL with parameters
and the other on the three-legged OAuth2 [7].

Simple Authentication Object

"request_token_url": uri,
"url_parameters": object

By simple authentication, we mean authentication with
parameters such as an API key or a login and password passed to
a unique URL so as to receive the authentication token.

OAuth2 Authentication Object

"consumer_key": string,
"consumer_secret": string,
"request_token_url": uri,
"access_token_url": uri,
"authorize_url": uri

Many social platforms (e.g., Twitter, Facebook, Google+)
accept OAuth2 authentication.

Policy Object

"requests_per_hour": integer,
"too_many_calls_response_code": integer,
"too_many_calls_waiting_seconds": integer

An overload can be detected by counting the requests or by
receiving a too-many-calls response. In the latter case, API
Blender will snooze for the specified amount of time before
testing whether the counter has been reset.

Interaction description format. An interaction is a class
of HTTP requests with a common root path and their
associated responses. Here also we extended SPORE and added
the response object.

Interaction Object

"name": string,
"description": string,
"request": request_object,
"response": response_object

The description is optional.

Request Object



"root_path": string,
"method": string,
"raw_content": string,
"url_parameters": [
    [ string,   # key, e.g., "id"
      string,   # type, e.g., "integer"
      boolean,  # is it an optional parameter?
      object    # the default value, it can be null
    ]
]



The method has to be GET, PUT, POST or DELETE. Providing
raw content is optional and useful only for the PUT and
POST methods. If a default value is set on a URL parameter,
the parameter is automatically passed with that value
unless it is explicitly set to null. This feature can be useful
in many cases, such as requesting a default of 100
results per page for a full-text query on the Twitter search
API.
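The default-value rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API Blender code: the function name `build_query`, the parameter names in `SEARCH_PARAMETERS`, and the choice of raising `ValueError` for unknown or missing parameters are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical url_parameters entries: [key, type, optional, default].
SEARCH_PARAMETERS = [
    ["q", "string", False, None],
    ["rpp", "integer", True, 100],
    ["page", "integer", True, None],
]


def build_query(spec, values):
    """Merge user-supplied values with the defaults declared in the spec."""
    known = {key for key, _type, _optional, _default in spec}
    unknown = set(values) - known
    if unknown:
        raise ValueError("unknown parameters: %s" % sorted(unknown))
    query = {}
    for key, _type, optional, default in spec:
        if key in values:
            # An explicit None (JSON null) drops the parameter,
            # even when the spec declares a default for it.
            if values[key] is not None:
                query[key] = values[key]
        elif default is not None:
            query[key] = default
        elif not optional:
            raise ValueError("missing mandatory parameter: %s" % key)
    return urlencode(query)
```

With this sketch, `build_query(SEARCH_PARAMETERS, {"q": "good spirit"})` sends the declared default for `rpp`, while passing `"rpp": None` suppresses it.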

Response Object

"expected_status_code": integer,
"serialization_format": serialization_format,
"expected_schema": json_schema_object,
"integration": extractor_object



The expected code is optional and defaults to 200. The
serialization format has to be JSON or XML at the moment.
The expected schema of the response is optional and can be
defined as a JSON schema [5]. At the moment, we define
a simple extractor that allows a mapping between a unified
model and response fields. We use '.' as a path separator.
For instance, we could have "post.content": "post_data.text"
if our integrated model was {"post": {"content": string}}
with a response model of {"post_data": {"text": string}}.
With a careful normalization model (for instance using concepts
of an ontology), this makes it possible to integrate data coming
from different platforms. As an extension, this semantic
model could also be used to describe the inputs of services,
a first step towards semantic service orchestration.
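The dotted-path mapping described above can be illustrated with a short Python sketch. It assumes responses are already parsed into nested dictionaries; the helper names `get_path`, `set_path`, and `integrate` are hypothetical and are not taken from the API Blender source.

```python
def get_path(data, path, separator="."):
    """Follow a dotted path through nested dictionaries."""
    for key in path.split(separator):
        data = data[key]
    return data


def set_path(data, path, value, separator="."):
    """Create nested dictionaries as needed, then set the final key."""
    keys = path.split(separator)
    for key in keys[:-1]:
        data = data.setdefault(key, {})
    data[keys[-1]] = value


def integrate(response, mapping):
    """Build a unified record from a {unified_path: response_path} mapping."""
    unified = {}
    for target, source in mapping.items():
        set_path(unified, target, get_path(response, source))
    return unified
```

Applied to the example in the text, `integrate({"post_data": {"text": "hello"}}, {"post.content": "post_data.text"})` produces a record shaped like the integrated model, with the text stored under `post.content`.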

3. THE PYTHON IMPLEMENTATION 

Python is becoming increasingly popular among developers.
On the social coding platform GitHub, it is ranked
third. We find Python to be simple, flexible, and to have
many useful libraries. We have therefore chosen to implement
API Blender in this language. API Blender is available
online at https://github.com/netiru/apiblender

Structure. The module structure offered by Python allows 
us to adopt the following light structure. 



API Blender package

main.py          Controller
server.py        Server and interactions
policy.py        Policy management
auth.py          Authentication management
config/          JSON configuration files
--general.json   General config
--apis/          API config files


We found it convenient to have one file per API server where 
we gather the descriptions for the server and its interactions.
Currently, API Blender supports the two Twitter
APIs (generic and search), Facebook, Google+, FlickR and 
Youtube. 
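As an illustration of how such a per-server file could be read, the sketch below parses a server description and fills in the defaults stated in Section 2 (port 80, expected status code 200). It is only a sketch under assumptions: the function name `load_server_description`, the host `search.example.org`, and the policy values in `EXAMPLE` are hypothetical and do not come from the API Blender repository.

```python
import json

# Port, policy, and authentication are optional in the server object.
REQUIRED_SERVER_KEYS = ("name", "host", "interactions")


def load_server_description(text):
    """Parse a JSON server description and apply the documented defaults."""
    server = json.loads(text)
    missing = [key for key in REQUIRED_SERVER_KEYS if key not in server]
    if missing:
        raise ValueError("missing server keys: %s" % ", ".join(missing))
    server.setdefault("port", 80)
    server.setdefault("policy", None)
    server.setdefault("authentication", None)
    for interaction in server["interactions"]:
        # The expected status code defaults to 200.
        interaction["response"].setdefault("expected_status_code", 200)
    return server


# Hypothetical description, for illustration only.
EXAMPLE = """
{
  "name": "example-search",
  "host": "search.example.org",
  "policy": {"requests_per_hour": 150,
             "too_many_calls_response_code": 420,
             "too_many_calls_waiting_seconds": 300},
  "interactions": [
    {"name": "search",
     "request": {"root_path": "/search.json", "method": "GET",
                 "url_parameters": [["q", "string", false, null]]},
     "response": {"serialization_format": "JSON"}}
  ]
}
"""
```

Keeping the server and its interactions in one document means a single `json.loads` call yields everything needed to issue requests against that host.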
GitHub language ranking: https://github.com/languages



Features. API Blender implements several valuable features.
It supports the two main authentication types: using
a single URL with parameters and OAuth2 [7] thanks to
Python OAuth.
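For the first authentication type, the token request amounts to appending the configured parameters to a single URL. The following sketch shows one way to do this with the standard library; the function name `build_token_request_url` and the example URL are hypothetical, and how the returned token is parsed is left out because the text does not specify it.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_token_request_url(auth):
    """Append url_parameters to request_token_url as a query string."""
    parts = urlsplit(auth["request_token_url"])
    extra = urlencode(auth.get("url_parameters", {}))
    # Preserve any query string already present in the configured URL.
    query = "&".join(q for q in (parts.query, extra) if q)
    return urlunsplit(parts._replace(query=query))
```

The `auth` argument is assumed to be a parsed Simple Authentication Object, that is, a dictionary with `request_token_url` and `url_parameters` keys.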

API Blender also enforces the server policy;
when the hourly limit is reached or when a too-many-calls 
response is identified, the policy manager will stop for some 
time and periodically test if the counter has been reset. Er- 
ror handling is taken into consideration too, whether it re- 
gards a non-conforming configuration file or an unexpected 
response. Finally, API Blender gives the possibility to ex- 
tract and normalize elements from responses. This feature 
supports simple field extraction and standardization at the 
moment but the same process will be possible with arbitrary 
subtree transformations in the near future. 
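The policy behaviour described above can be sketched as a small class that counts requests within an hourly window and snoozes on overload. This is a simplified illustration, not the API Blender policy manager: the class and method names are hypothetical, the 60-second fallback wait is an assumption, and the sketch sleeps until the end of the window instead of periodically testing whether the counter has been reset.

```python
import time


class PolicyManager:
    """Counts requests per hour and snoozes on overload (illustrative)."""

    def __init__(self, policy, clock=time.time, sleep=time.sleep):
        self.limit = policy.get("requests_per_hour")
        self.too_many_code = policy.get("too_many_calls_response_code")
        self.wait = policy.get("too_many_calls_waiting_seconds", 60)
        self.clock = clock
        self.sleep = sleep
        self.window_start = clock()
        self.count = 0

    def before_request(self):
        """Block until the hourly budget allows one more request."""
        now = self.clock()
        if now - self.window_start >= 3600:
            self.window_start, self.count = now, 0
        if self.limit is not None and self.count >= self.limit:
            # Budget exhausted: wait for the rest of the current hour.
            self.sleep(3600 - (now - self.window_start))
            self.window_start, self.count = self.clock(), 0
        self.count += 1

    def should_retry(self, status_code):
        """Snooze and ask for a retry after a too-many-calls response."""
        if self.too_many_code is not None and status_code == self.too_many_code:
            self.sleep(self.wait)
            return True
        return False
```

Injecting `clock` and `sleep` is a design choice made here so that the waiting logic can be exercised without actually pausing the program.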

Request chaining. The open nature of API Blender combined
with the flexibility of Python can fill many needs. Request
chaining becomes very simple with Python and complex
interactions can become easy-to-maintain Python libraries.
We illustrate this with the following example on
two Twitter APIs. The program below retrieves the last
three pages of tweets containing the keywords "good spirit"
then fetches the local social network (followers and followees)
of the authors of the tweets.



Example of request chaining with Python

import apiblender

blender = apiblender.Blender()

# Retrieving 3 pages of results
blender.load_server("twitter-search")
blender.load_interaction("search")
users = set()
for p in range(1, 4):
    blender.set_parameters({"q": "good spirit",
                            "page": p})
    response = blender.blend()
    ts = response["prepared_content"]["results"]
    for twitt in ts:
        users.add(twitt["from_user"])

# Retrieving followers / followees for each user
blender.load_server("twitter-generic")
for user in users:
    blender.load_interaction("followers")
    blender.set_parameters({"screen_name": user})
    followers = blender.blend()
    blender.load_interaction("followees")
    blender.set_parameters({"screen_name": user})
    followees = blender.blend()
    # Printing everything
    print("User Name: %s" % user)
    print("\tFollowers: %s" %
          followers["prepared_content"])
    print("\tFollowees: %s" %
          followees["prepared_content"])



Created and maintained by SimpleGeo Inc.
https://github.com/simplegeo/python-oauth2



4. CONCLUSIONS

API Blender has been designed in the context of the
ARCOMEM project on social Web archiving, and is put to
use in this project to crawl and integrate data from various
social Web platforms. We have found its flexibility useful
in light of the dynamicity of social Web platforms and
managed to conveniently integrate the five platforms currently
supported: Twitter, Facebook, FlickR, Google+, and
Youtube. It is of potential use in any application that needs
to access similar REST-inspired Web APIs and to export
responses in a common schema. The source code being available
on GitHub, we hope to solicit contributions, either in
the form of extensions of the base functionalities, or in that
of API descriptions. For future work, we see many promising
opportunities such as:

1. smarter processing of responses, making use of the
semantics of the services described, in the spirit of the
semantic description of Web services a la OWL-S;

2. developing more standard request chaining libraries;

3. a possible integration of the different input schemas;

4. more research for a smarter snooze management;

5. distributing requests across different servers.

They all require being very conscious of the existing trade-off
between completeness and flexibility.